A general dynamical statistical model with 
possible causal interpretation 

Daniel Commenges 1,2 and Anne Gégout-Petit 2,3
February 2, 2008 

1 INSERM, U 875, Bordeaux, F33076, France 

2 Université Victor Segalen Bordeaux 2, Bordeaux, F33076, France

3 IMB, UMR 5251, Talence, F33405, France

Summary. We develop a general dynamical model as a framework for 
possible causal interpretation. We first state a criterion of local independence 
in terms of measurability of processes involved in the Doob-Meyer decom- 
position of stochastic processes, as in Aalen (1987); then we define direct 
and indirect influence. We propose a definition of causal influence using the 
concepts of "physical system". This framework makes it possible to link 
descriptive and explicative statistical models, and encompasses quantitative 
processes and events. One of the features of this paper is the clear distinc- 
tion between the model for the system and the model for the observation. 
We give a dynamical representation of a conventional joint model for HIV 
load and CD4 counts. We show its inadequacy to capture causal influences 
while on the contrary known mechanisms of HIV infection can be expressed
directly through a system of differential equations. 

Keywords: Causality; causal influence; differential equations; directed 
graphs; dynamical models; HIV; randomisation; stochastic processes. 

1 Introduction 

The issue of causality has been studied by many philosophers since Aristo- 
tle and is of central importance in all branches of science (see Bunge, 1979 
and Salmon, 1984). A central question for scientists who use statistics and 
for statisticians is whether statistical models may help in deciphering causal 
links. After recognising that correlation is not causation, scientists have 
tended to use statistical methods as one element among others to help estab- 
lish causal links. Epidemiologists are particularly cautious, and with good 
reason, in drawing causal conclusions. There has been, however, a growing
interest in developing statistical models able to represent causal influences. 
From the beginning, graphs have played an important role in representing 
the set of causal influences. The pioneering work of Wright (1921, 1934)
has inspired the more recent developments of structural equation models
(Jöreskog, 1978) and graphical models (Dawid, 1979; Lauritzen and Wermuth,
1989; Cox and Wermuth, 1996). An approach using the modelling
of "potential outcome", often called the counterfactual approach, has been 
proposed in the context of clinical trials by Rubin (1974) and further stud- 
ied by Holland (1986) among others. The counterfactual approach has been 
extended to the study of longitudinal incomplete data in several papers, the 
results of which have been gathered together by van der Laan and Robins
(2002). Spirtes, Glymour and Scheines (2000) and Pearl (2000) develop the
issue of investigating causality with graphical models. 

The counterfactual approach however has been criticised (Dawid, 2000; 
Geneletti, 2007) and the modelling of potential outcomes raises difficulties 
when treating truly dynamical problems. In fact another school tackles 
causality by directly using dynamical models. This approach started in the 
econometrics literature with Granger (1969) and Schweder (1970) and was 
more recently developed by several Scandinavian statisticians using the for- 
malism of stochastic processes, and in particular of counting processes (for 
a review see Eerola, 1994; Aalen and Frigessi, 2007). Of particular interest 
is the paper by Aalen (1987) which outlines a general approach for defining 
influences for stochastic processes through the Doob-Meyer decomposition. 
The most recent developments of the dynamical approach are the method of 
"dynamic path analysis" of Fosen et al. (2006) and the study of the possibly 
cyclic directed graphs associated with this definition of influence by Didelez 
(2007). Defining influence in the stochastic process framework does not en- 
sure that we make relevant causal inference but we believe that it provides 
a better formalism for tackling this issue than approaches which deal only 
with random variables. 

The aim of this paper is to develop the dynamical approach in a general 
framework, focusing in particular on causal interpretation, using the concept
of system, which was advocated long ago by von Bertalanffy (1968);
we attempt to go from the concept of "influence", which is mathematically
defined, to the concept of "causal influence", which has a physical meaning.
We make a clear distinction between the model for the system and the model
for the observations, a classical distinction in automatic control (Jazwinski, 1970)
but not in biostatistics. Moreover we link classical epidemiological models 
and mechanistic models; the latter are not generally taken into consideration 
in the literature of causal models although (or because) they make explicit 
use of scientific knowledge. 

The paper is organised as follows. In section 2 we develop a criterion 
of local independence in terms of measurability of processes involved in the 
Doob-Meyer representation; then we define direct and indirect influence. In 
section 3 we propose a definition of causal influence using the concepts of 
"physical system" and "physical laws" for which we propose a definition. 
Our framework makes it possible to link descriptive and explicative statistical 
models and encompasses the analysis of events and of quantitative processes. 
In section 4 we develop the distinction between the model for the system and 
the model for the observation. In section 5 descriptive and explicative joint 
models of HIV load and CD4 counts are considered. 

2 Local independence, direct and indirect influence

2.1 Notations 

Consider a filtered space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ and a multivariate stochastic process
$X = (X_t)_{t \ge 0}$; $X_t$ takes values in $\Re^m$, and the whole process $X$ takes values
in $D(\Re^m)$, the Skorohod space of all cadlag functions $\Re^+ \to \Re^m$. We suppose
that all the filtrations satisfy the usual conditions. We have
$X = (X_j, j = 1, \ldots, m)$ where $X_j = (X_{jt})_{t \ge 0}$. We shall note $X_j \in X$. We denote by $\mathcal{X}_t$
the history of $X$ up to time $t$, that is $\mathcal{X}_t$ is the $\sigma$-field $\sigma(X_u, 0 \le u \le t)$,
and by $(\mathcal{X}_t) = (\mathcal{X}_t)_{t \ge 0}$ the families of these histories, that is the filtration
generated by $X$. Similarly we shall denote by $\mathcal{X}_{jt}$ and $(\mathcal{X}_{jt})$ the histories and
filtration associated to $X_j$. If $C$ is a subset of $(1, \ldots, m)$ we shall call $X_C$ the
multivariate process $(X_j, j \in C)$.

2.2 Local independence, direct and indirect influence 

Let $\mathcal{F}_t = \mathcal{H} \vee \mathcal{X}_t$; $\mathcal{H}$ may contain information known at $t = 0$, in addition to
the initial value of $X$. We shall consider the class of special semi-martingales,
that is the class of processes which admit a unique Doob-Meyer decomposition
in the $(\mathcal{F}_t)$ filtration, under probability $P$:

$$X_t = \Lambda_t + M_t, \quad t \ge 0, \qquad (1)$$

where $M_t$ is a martingale and $\Lambda_t$ is a predictable process with bounded variation.
We shall denote the Doob-Meyer decomposition of $X_j$ by $X_{jt} = \Lambda_{jt} + M_{jt}$.
We shall consider the non-degenerate case in which all the components of M 
are different from zero; the deterministic case will be studied in section 2.4. 
We shall assume two conditions bearing on the bracket process of the mar- 
tingale M: 

A1 $M_j$ and $M_k$ are orthogonal martingales, for all $j \ne k$;
A2 $X_j$ is either a counting process or is continuous with a deterministic
bracket process, for all $j$.

We call $\mathcal{D}$ the class of all special semi-martingales satisfying A1 and
A2. The class of special semi-martingales is stable under an absolutely
continuous change of probability (Jacod and Shiryaev, 1987, page 43) and this is also
true for the class $\mathcal{D}$.

Definition 1 (Weak conditional local independence (WCLI)) $X_k$ is
weakly locally independent of $X_j$ in $X$ on $[0, \tau]$ if and only if $\Lambda_k$ is
$(\mathcal{F}_{-jt})$-predictable on $[0, \tau]$, where $\mathcal{F}_{-jt} = \mathcal{H} \vee \mathcal{X}_{-jt}$ and
$\mathcal{X}_{-jt} = \vee_{l \ne j} \mathcal{X}_{lt}$. Equivalently we can say in that case that $X_k$ has the same
Doob-Meyer decomposition in $(\mathcal{F}_t)$ and in $(\mathcal{F}_{-jt})$. We will note in that case
$X_j \nrightarrow_X X_k$.

Remark 1. Assumption A2 is necessary for the measurability-based
definition of WCLI to be clearly interpreted. If we did not impose A2 we
could find counter-examples in which WCLI holds while intuitively independence
does not hold. Such a counter-example is the process $X = (X_1, X_2)$
which is the solution of the differential equation: $dX_{1t} = a\,dt + b\,dW_{1t}$;
$dX_{2t} = X_{1t}\,dt + e^{X_{1t}}\,dW_{2t}$, where $W_1$ and $W_2$ are Brownian motions. We
would not like to say that $X_2$ is WCLI of $X_1$. However, because $X_1$ appears in
the bracket process of $X_2$, $\mathcal{X}_{1t}$ is included in $\mathcal{X}_{2t}$ so that $\Lambda_2$ is $(\mathcal{X}_{2t})$-predictable
and thus we would conclude that $X_2$ is WCLI of $X_1$.
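
The mechanism behind this counter-example can be checked numerically. The
following Python sketch is an illustration only, not part of the original
development: it simulates the system of Remark 1 with an Euler scheme, reading
the diffusion coefficient of X_2 as exp(X_1t), and then recovers the level of
X_1 from the realised quadratic variation of X_2 alone. The function name, the
parameter values (a, b, step size, seed) and the windowed estimator are
assumptions chosen for the illustration.

```python
import math
import random


def recover_x1_from_bracket(a=0.5, b=0.1, dt=1e-4, n_steps=4000, seed=0):
    """Euler scheme for dX1 = a dt + b dW1, dX2 = X1 dt + exp(X1) dW2.

    Returns (x1_hat, x1_mean): an estimate of X1 built only from the
    increments of X2, and the actual time-average of X1 on the window.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x1 = 0.0
    x1_sum = 0.0
    squared_increments = 0.0
    for _ in range(n_steps):
        dw1 = rng.gauss(0.0, sqrt_dt)
        dw2 = rng.gauss(0.0, sqrt_dt)
        dx2 = x1 * dt + math.exp(x1) * dw2
        squared_increments += dx2 * dx2
        x1_sum += x1
        x1 += a * dt + b * dw1
    # The bracket of X2 grows at rate exp(2 * X1), so the realised
    # quadratic variation per unit time estimates exp(2 * X1).
    qv_rate = squared_increments / (n_steps * dt)
    x1_hat = 0.5 * math.log(qv_rate)
    return x1_hat, x1_sum / n_steps


if __name__ == "__main__":
    estimate, truth = recover_x1_from_bracket()
    print(f"estimate from X2 only: {estimate:.3f}, average of X1: {truth:.3f}")
```

Under these assumptions the two returned values are close, which is the sense
in which the history of X_1 is already contained in the history of X_2 when
A2 is not imposed.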

Remark 2. It is tempting to define WCLI directly in terms of the con- 
ditional independence: 

$$\mathcal{X}_{kt} \perp\!\!\!\perp_{\mathcal{X}_{Ct-},\, \mathcal{X}_{kt-}} \mathcal{X}_{jt-}, \quad 0 < s < t < \tau. \qquad (2)$$

Here $X = (X_j, X_k, X_C)$. However, this condition is void in general when
we consider processes in continuous time. Because conditional independence
is defined via conditional probability and, in general, events of $\mathcal{X}_{kt}$ will have
conditional probabilities equal to one or zero given $\mathcal{X}_{kt-}$, the condition will
always hold. It is possible that WCLI can be defined in terms of conditional
independence of $\sigma$-fields but this is an open problem.

Definition 2 (Direct influence) We shall say that if $X_k$ is not WCLI of
$X_j$ in $X$, $X_j$ directly influences $X_k$ in $X$ and we will note $X_j \rightarrow_X X_k$.

Definition 3 (WCLI and Direct influence for set of components) Let
$A$, $B$ be subsets of $(1, \ldots, m)$. We shall say that $X_A \rightarrow_X X_B$ if there is $j \in A$
and $k \in B$ such that $X_j \rightarrow_X X_k$.

What we call here "direct influence" is the time-continuous analogue 
of Granger strong causality (Granger, 1969). We may consider another, 
stronger, condition of local independence. 

Definition 4 (Strong conditional local independence (SCLI)) $X_k$ is
SCLI of $X_j$ in $X$ if and only if $X_j \nrightarrow_X X_k$ and there is no $X_D \in X$ such
that $X_j \rightarrow_X X_D$ and $X_D \rightarrow_X X_k$, and we will note $X_j \nrightarrow\rightarrow_X X_k$.

Definition 5 (Influence) We shall say that if $X_k$ is not SCLI of $X_j$, $X_j$
influences (at least indirectly) $X_k$ in $X$ and we will note $X_j \rightarrow\rightarrow_X X_k$.

An interesting case is when weak independence holds but strong independence
does not hold; equivalently $X_j$ influences $X_k$ but $X_j$ does not directly
influence $X_k$; we shall say that $X_j$ indirectly influences $X_k$.

Definition 6 (Indirect influence) If $X_j \rightarrow\rightarrow_X X_k$ and $X_j \nrightarrow_X X_k$ then
there is $X_C \in X$ such that $X_j \rightarrow_X X_C \rightarrow_X X_k$ and we shall say that
$X_j$ indirectly influences $X_k$ through $X_C$ in $X$.

Remark. Since the Doob-Meyer decomposition depends on $P$ so do
all the independencies and influences; realising this fact is crucial for the 
definition of causal influence in section 3.1. 

2.3 Differential equation: towards causal interpretation

Writing the process of interest in the form of a stochastic differential equation
(SDE) is a way of making the causal mechanisms at work more explicit. If
$\Lambda_t$ is differentiable, the Doob-Meyer decomposition can be written:

$$dX_t = \lambda_t\,dt + dM_t, \qquad (3)$$

with $\Lambda_t = \int_0^t \lambda_u\,du$. Differential equation models are commonly used in
physics, biology and in finance (Øksendal, 2000) to model the evolution of $X_t$
as a function of the past plus a random term brought by the martingale. The 
two main cases, which have been considered in different streams of research, 
are the case where the trajectories of X are continuous and the case where X 
is a counting process. In the case of continuous trajectories of X it is common 
to take for $M$ a Brownian martingale (in which case $dM_t = f(t)\,dW_t$, with
$W = (W_t)$ a Brownian motion), and the models considered are Itô processes.
In the case where $X$ is a counting process we write:

$$dX_t = \lambda_t\,dt + dM_t, \qquad (4)$$

where $M$ is a discontinuous martingale with predictable variation process
equal to $\Lambda$, and $\lambda$ is called the intensity of the process. We may consider
mixing the two cases, considering that $X = (X_1, X_2)$, where $X_1$ is an Itô
process and $X_2$ a counting process, each of these processes being possibly
multivariate. The processes defined by these differential equations are not 
Markov in general. The Markov assumption is an interesting particular case 
and it is discussed in section 3.2.2. 
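
As a minimal numerical illustration of the counting-process decomposition, the
sketch below takes the simplest special case, a constant intensity (so that
the compensator is the intensity times t), and checks by simulation that the
compensated count has mean close to zero, as a martingale started at zero
should. The constant intensity, the function names and the parameter values
are assumptions made for the example; the general setting of the text allows
the intensity to depend on the whole past.

```python
import random


def simulate_count(intensity, horizon, rng):
    """Number of events on [0, horizon] for a constant-intensity counting process.

    Events are generated from independent exponential waiting times.
    """
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(intensity)
        if t > horizon:
            return n
        n += 1


def mean_compensated_count(intensity=2.0, horizon=5.0, n_paths=2000, seed=1):
    """Monte Carlo average of M_T = X_T - intensity * T over n_paths paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_count(intensity, horizon, rng) - intensity * horizon
    return total / n_paths


if __name__ == "__main__":
    print(f"average of X_T - Lambda_T: {mean_compensated_count():.3f}")
```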

2.4 The deterministic case 

Ordinary differential equation (ODE) models seem to arise as particular cases 
in which M = 0. So one way to apply our definition of WCLI to deterministic 
models is to consider that these models are in fact stochastic but the martingale
has a bracket process which takes small values relative to $\Lambda$. Particular
phenomena appear in purely deterministic models, in particular because the
concept of filtration no longer applies. In that case the uniqueness of the
differential equation is lost. For instance consider the process $X = (X_1, X_2)$;
consider the case where the process $X$ is deterministic and the trajectories
are solutions of the ODE system: $dX_{1t} = a\,dt$; $dX_{2t} = X_{1t}\,dt$ with initial
conditions $X_{10} = X_{20} = 0$. The trajectories are also solutions of the ODE
system: $dX_{1t} = a\,dt$; $dX_{2t} = at\,dt$ with initial conditions $X_{10} = X_{20} = 0$.
One would be tempted to say that $X_1$ influences $X_2$ when looking at the
first ODE system but not when looking at the second one. The second
ODE system however is not time-homogeneous. Uniqueness can thus be restored
if we impose the restriction of time-homogeneity (see in section 3.2.2 a discussion
of the physical meaning of time-homogeneity). Taking advantage of
the uniqueness of the time-homogeneous differential equation representation, we
will consider it as the canonical representation, if it exists. We can then use
the definition of WCLI for stochastic differential equations to define WCLI
for the deterministic case: construct an SDE system by adding to the ODE
system orthogonal martingales with deterministic brackets. The influence
graph of the time-homogeneous ODE system is, by definition, the same as
that of the derived SDE. In the above example, if we add a standard Wiener
martingale to the canonical (time-homogeneous) representation we obtain
the SDE: $dX_{1t} = a\,dt + dW_{1t}$; $dX_{2t} = X_{1t}\,dt + dW_{2t}$, in which it is clear that
we have $X_1 \rightarrow_X X_2$.
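
The loss of uniqueness in the deterministic example can be seen directly by
integrating the two ODE systems. The Python sketch below is an illustration
under assumed values (a = 2, unit horizon, explicit Euler scheme; the function
names are ours): both right-hand sides produce the same trajectories, although
only the first one lets X_1 enter the equation for X_2.

```python
A = 2.0  # assumed value of the constant a in the example


def homogeneous(t, x1, x2):
    """Time-homogeneous system: dX1 = a dt, dX2 = X1 dt."""
    return A, x1


def non_homogeneous(t, x1, x2):
    """Non-homogeneous system: dX1 = a dt, dX2 = a t dt."""
    return A, A * t


def euler(rhs, n_steps=1000, horizon=1.0):
    """Explicit Euler integration from X1(0) = X2(0) = 0; returns (X1, X2) at the horizon."""
    dt = horizon / n_steps
    t, x1, x2 = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        d1, d2 = rhs(t, x1, x2)
        x1, x2, t = x1 + d1 * dt, x2 + d2 * dt, t + dt
    return x1, x2


if __name__ == "__main__":
    print(euler(homogeneous))
    print(euler(non_homogeneous))
```

Both calls approximate X_1(1) = a and X_2(1) = a/2, so the trajectories alone
cannot tell the two systems apart; the time-homogeneity restriction is what
selects the first one as canonical.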

2.5 Graph representation 

We may construct as in Didelez (2007) a directed graph representing influ- 
ences between components of X. This directed graph has for vertices the 
components $X_j$ and there is a directed edge $(j, k)$ if and only if $X_j \rightarrow_X X_k$.
Note that there can be two directed edges between two vertices, for instance
$(j, k)$ and $(k, j)$; this can be denoted by two arrows or by a double-sided arrow
($\leftrightarrow$). A path is an ordered sequence of directed edges
$\{(j_0, j_1), (j_1, j_2), \ldots, (j_{k-1}, j_k)\}$.
Indirect influence can be read directly off the graph: $X_j \rightarrow\rightarrow_X X_k$ if and only if there
is a path from $j$ to $k$. An example is shown in figure 1 which represents
the hypothetical influence graphs for processes $X^1$ on the left and $X^2$ on the
right; the graphs are not acyclic and in particular we have $X_2 \rightarrow_{X^1} X_4$ and
$X_4 \rightarrow_{X^1} X_2$. We see also that $X_1$ indirectly influences $X_4$ but does not
influence $X_3$, which we can note: $X_1 \rightarrow\rightarrow_{X^1} X_4$ and $X_1 \nrightarrow\rightarrow_{X^1} X_3$. The
graph for $X^2$ on the right may represent a richer system; we shall develop
the issue of considering a family of nested systems in section 3.1.

Figure 1: Example of two graphs from the same physical system.

We say that $X_C$ blocks the paths from $X_l$ to $X_k$ if all the paths from $X_l$
to $X_k$ contain a node in $X_C$. For instance $X_4$ blocks the paths from $X_3$ to
$X_2$ in $X^1$. In $X^2$ there is no path from $X_3$ to $X_2$, so $X_4$ still blocks the paths
from $X_3$ to $X_2$, although in a trivial manner. If there is a path from $X_l$ to
$X_k$ and $X_j$ blocks the paths from $X_l$ to $X_k$ there is necessarily a path from
$X_l$ to $X_j$ and a path from $X_j$ to $X_k$, which can be expressed as:

Lemma 1 (Decomposable influence) If $X_l \rightarrow\rightarrow_{X^m} X_k$ and $X_j$ blocks
the paths from $X_l$ to $X_k$ then $X_l \rightarrow\rightarrow_{X^m} X_j$ and $X_j \rightarrow\rightarrow_{X^m} X_k$.
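
The graph notions of this section (influence as existence of a path, indirect
influence, blocking) translate into a few lines of code. The sketch below is
an illustration only: the function names are ours, and the edge set is a
hypothetical graph chosen to be consistent with the statements made in the
text about the left-hand graph of figure 1, not a transcription of the figure
itself.

```python
def reachable(edges, start):
    """Vertices reachable from `start` by following one or more directed edges."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def influences(edges, j, k):
    """X_j influences X_k (at least indirectly): there is a path from j to k."""
    return k in reachable(edges, j)


def indirectly_influences(edges, j, k):
    """There is a path from j to k but no direct edge (j, k)."""
    return influences(edges, j, k) and (j, k) not in edges


def blocks(edges, block, l, k):
    """Every path from l to k contains a vertex of `block` (trivially true if no path)."""
    pruned = {(a, b) for a, b in edges if a not in block and b not in block}
    return k not in reachable(pruned, l)


# Hypothetical edge set, assumed for illustration only.
EDGES_X1 = {(1, 2), (2, 4), (4, 2), (3, 4)}

if __name__ == "__main__":
    print(indirectly_influences(EDGES_X1, 1, 4))  # X_1 reaches X_4 through X_2
    print(influences(EDGES_X1, 1, 3))             # no path from X_1 to X_3
    print(blocks(EDGES_X1, {4}, 3, 2))            # X_4 blocks the paths from X_3 to X_2
```

On this assumed graph the statement of Lemma 1 can be checked by hand: X_3
influences X_2, X_4 blocks the paths from X_3 to X_2, and indeed X_3
influences X_4 and X_4 influences X_2.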

3 Causal influences 

3.1 Systems, causal influence 

In this section we outline a philosophical theory of causality; this theory is 
necessarily incomplete and questionable but we feel that a theory of this 
kind is necessary to link the mathematical definitions to the real world. The 
main concept used will be that of system and we will take two examples to 
illustrate this and other concepts: the first is the archetypal example of the 
solar system; the second is the system formed by the immune system and a
population of HIV viruses. Thus our first task is to define a system, that we 
will also call a "physical system" S in which we are interested. To define a 
system we admit that we can define a level at which relevant characteristics 
can be defined: we may distinguish a vector of attributes and a state vector. 
The attributes essentially define the system and do not vary in time; the 
state represents the characteristics, in general varying with time, in which 
we are really interested and will be represented by a multivariate stochastic 
process. We may decide for instance that the level we are interested in is that 
of the sun and the planets and their trajectories. A possible system may be 
identified by the sun and the nine planets; the attributes of the system are 
the masses of these ten celestial bodies; the state at time t is the vector of 
position and speed (in a reference system) of the ten celestial bodies. What 
we have excluded in defining this level are the details of the planets such as 
their physical structure, presence of life, particular events like storms and 
so on (see Batterman, 2002). In the Immune-HIV system example, we may 
decide that the level we are interested in is that of populations of cells or 
of HIV viral particles in a particular subject; the attributes describe which 
types of cells or viral particles are considered and characteristics of these 
populations if they may differ from one subject to another; the state may be 
the numbers (or concentrations) of different types of cells or of viral particles. 
At this level we are not interested in the fate of a particular cell. 

Now we suppose that we are particularly interested in one or several 
components of the state. We assume that there are laws which govern the 
evolution of the states and some of the laws tell us that the evolution of 
the component j at time t depends on the component k just before t. Newton's
laws, including the gravitation law, tell us how to compute the force
of attraction between two massive bodies. However it is impossible to find a 
system with two massive bodies completely isolated from the rest of the uni- 
verse; moreover it is very difficult to avoid the circularity in the definitions. 
For instance we have used the concept of system in the previous sentence, a 
concept which is still not defined. It would be tempting to define system by 
first defining causal influence. However in order to define causal influence 
we need to apply natural laws to a system. There is the problem of defining 
natural laws or physical laws. 

Definition 7 (System) A system is the couple $S = (A, X)$ of attributes
and state. The attribute $A$ is a possibly random element with value in $\Re^d$
which, together with the state, is sufficient to identify the system. The state
$X$ is a stochastic process from $(\Omega, \mathcal{F})$ on $(D(\Re^m), \Sigma)$, where $\Omega$ is the universe
and $\mathcal{F}$ contains all the events pertaining to the level of interest; $D(\Re^m)$ is a
Skorohod space of all cadlag functions $\Re^+ \to \Re^m$, and $\Sigma$ the Borel sigma-field
derived from the Skorohod topology.

We consider that deterministic $A$ and $X$ are a particular case of the
stochastic case, with the reservation made in section 2.4 that WCLI is defined
only for time-homogeneous ODE. Often the attribute will be considered as
deterministic. In the solar system example, both attributes and states may
be considered as deterministic, or we may consider the attributes as random but work
with a probability conditional on the observed value. The rationale for considering
attributes as random is that they are the results of systems of another
level: the existence of the planets is the result of the process of formation of
the solar system. Note that even in that example, complex systems or long
range predictions may raise the issue of chaos, thus introducing a stochastic 
feature (Murray and Dermott, 1999). 

Given a system $S^m = (A^m, X^m)$, we call $\mathcal{F}_t^m$ the sigma-field generated
by the attribute and the history of the state at time $t$,
$\mathcal{F}_t^m = \sigma(A^m) \vee \sigma(X_u^m, 0 \le u \le t) = \mathcal{A}^m \vee \mathcal{X}_t^m$. It is important to consider several systems
and in particular we may consider nested systems. A system $S^{m'}$ is nested
in $S^m$ if $\mathcal{F}_t^{m'} \subset \mathcal{F}_t^m$ for all $t$: $S^{m'}$ can be enlarged by addition of attributes
($A^{m'} \subset A^m$) and/or addition of $X^{m'}$ components ($X_t^{m'} \subset X_t^m$). We can
consider a sequence of nested systems $\mathcal{S} = \{S^m\}_{m \ge 0}$ (we note $S^m \in \mathcal{S}$ and
$S^m \subset S^{m'}$ if $m < m'$). In this case, the family $\{\mathcal{F}_t^m\}_{m \ge 0}$ forms a filtration
(for each $t$). If we consider a period of observation $[0, \tau]$ (included in the
definition of the level) we note $\mathcal{F}^m = \mathcal{F}_\tau^m$. Note that saying that $S^m \subset S^{m'}$
is more general than considering that all the components of $S^m$ belong to
$S^{m'}$, although most results will refer to this case.

From now on, we will speak about direct and indirect influences of $X_j$
on $X_k$ in the system $S^m$ (and denote $X_j \rightarrow_{S^m} X_k$ or $X_j \rightarrow\rightarrow_{S^m} X_k$); these
influences correspond to the definitions of influences in $X^m$ in section 2 with
$\mathcal{H} = \mathcal{A}^m$.

We assume that there is a true probability law $P^*$ on $(\Omega, \mathcal{F})$ and we
denote its restriction to $\mathcal{F}^m$ by $P^*_{\mathcal{F}^m}$. We would like to approach $P^*_{\mathcal{F}^m}$ by
applying physical laws. We shall now endeavor to define physical laws. Let
us first define mathematical laws.

Definition 8 (Mathematical laws) Mathematical laws at a certain level
are a set of mathematical procedures that can be applied to any system $S^m$ of
this level to build a probability $P^{S^m}$ on $\mathcal{F}^m$.

Generally the probability $P^{S^m}$ will be different from $P^*_{\mathcal{F}^m}$. Suppose that we
are particularly interested in a system $S^1$; we may have to consider richer
systems for making correct predictions for the system of interest. We define
physical laws as yielding a probability that may be as close as we wish to
$P^*_{\mathcal{F}^1}$, if we can apply them to a correct system.

Definition 9 (Physical laws) If for any system $S^1$ of a given level, there
exists a sequence of nested systems $\mathcal{S} = \{S^m\}_{m \ge 0}$ including $S^1$ and mathematical
laws such that $P^{S^m}_{\mathcal{F}^1}$ converges weakly toward $P^*_{\mathcal{F}^1}$, these mathematical
laws will be called physical laws at this level, and such a sequence $\mathcal{S}$ will be
called an approximating sequence for $S^1$.

The weak convergence means that $\int g\,dP^{S^m}_{\mathcal{F}^1} \to \int g\,dP^*_{\mathcal{F}^1}$ for any $\mathcal{F}^1$-measurable
continuous bounded function $g$ on $\Omega$. We may also write
$d_P(P^{S^m}_{\mathcal{F}^1}, P^*_{\mathcal{F}^1}) \to 0$, where $d_P(\cdot, \cdot)$ is the Prokhorov metric for probability measures based on the
Skorohod topology. The advantage of the Prokhorov metric is that it metrizes
weak convergence (Gibbs and Su, 2002) and it encompasses the deterministic
case (which makes sense in the solar system example). In the deterministic
case $X$ takes the value $X^*$ with probability one under $P^*$ and the value
$X^{S^m}$ under $P^{S^m}$ and we have $d_P(P^{S^m}_{\mathcal{F}^1}, P^*_{\mathcal{F}^1}) = d(X^{S^m}, X^*)$.

We may postulate the existence of physical laws. This postulate reflects 
the asymptotic separability of the universe; that is, for making good predic- 
tions we do not need to take into account the whole universe, but on the 
other hand, application of the laws (even if we know the correct laws) never 
leads to perfect prediction, partly because we have isolated a system from
the rest of the universe. 

The systems may be more or less satisfactory according to the distance to 
the true probability achieved. For instance we would not call a set constituted 
of the Earth and Mars a satisfactory system; if we applied Newton's laws to 
this system we would see that the observed trajectories would be in large 
disagreement with the predicted ones; we would thus search for a better set 
of bodies, for instance the set (Sun, Earth, Mars). 

We have to make an assumption of finiteness of the approximating se- 
quence to have a clear definition of causal influence. We conjecture that this 
assumption could be avoided using a quantitative approach of WCLI but this 
is beyond the scope of this paper. 

A3. There is a perfect system $S^M$ for $S^1$ such that $\mathcal{F}^1 \subset \mathcal{F}^M$ and

$$P^{S^M}_{\mathcal{F}^1} = P^*_{\mathcal{F}^1}.$$

This means that the probability law computed with the physical law
applied to system $S^M$ coincides with the true law on the events of interest
$\mathcal{F}^1$.

We assume that A1 and A2 hold for all the systems considered (see
discussion in section 3.2.1); assuming A3 we can give the following definition. 

Definition 10 (Causal influence) A component $j$ has a causal influence
on a component $k$ in $S^1$ if $X_j \rightarrow\rightarrow_{S^M} X_k$ under $P^*$, if $S^M$ is a perfect system
for $S^1$.

Remark. The direct influences under the physical law are the same in all 
the systems and in particular in the perfect system; a direct influence under 
the physical law is thus always a causal direct influence.

Example: Solar system. If we consider a system (Earth, Moon) the
law of gravitation tells us that the Earth (in our presentation, the position
of the Earth) has an influence on the trajectory of the Moon; by definition
(if we accept that the law of gravitation is a physical law), this is a causal 
influence. Even if this system is not completely satisfactory, the notable fact 
is that in any richer system, the Earth will have an influence on the Moon; 
this stability is characteristic of causal influences. 

Example: Immune system-HIV. The mechanisms which derive from 
the properties of HIV and CD4 lymphocytes are such that HIV can infect 
CD4 lymphocytes and that infected lymphocytes can produce viruses. The 
number of viruses produced depends in part on the number of infected lym- 
phocytes. Thus the component of the state "number of viruses" (in a given 
individual) has a causal influence on the component "number of infected 
lymphocytes". We can deduce the form of causal influences at the level of 
concentrations from knowledge of the mechanisms which lead to the replica- 
tion of the virus and application of diffusion laws. The approach is similar 
to Boltzmann's theory of gases (see Strevens, 2005).

The problem of estimation of the true law $P^*$ will be dealt with in section 4.

3.2 Stability of structures in sequences of systems

3.2.1 Stability of the class $\mathcal{D}$

Since influences are defined in the class $\mathcal{D}$ and there is a need to consider
sequences of systems, the stability of $\mathcal{D}$ in the sequence is crucial. We have
the following Lemma:

Lemma 2 (Stability of $\mathcal{D}$) If $X^M \in \mathcal{D}$, then $X^m \in \mathcal{D}$ for all
$X^m \subset X^M$.

Proof. The class of special semi-martingales is stable under change of filtration
(Jacod and Shiryaev, 1987). The optional square-bracket process does
not depend on the filtration. Thus it remains deterministic for continuous
processes (A2) and the martingales remain orthogonal (A1). For counting
processes the orthogonality of the martingales holds if and only if the
martingales cannot jump at the same time, which does not depend on the 
filtration; the martingales of a continuous and a counting process are always 
orthogonal. 

3.2.2 Instability of the homogeneous Markov property 

It is interesting to examine the Markov properties of the models. In the 
general case the derivatives of the predictable processes involved in the Doob- 
Meyer decomposition depend on the whole past of the process. For instance 
we can make these dependencies explicit by writing $\lambda(t) = \lambda(t, X_u, 0 \le u < t)$.
In Markov models these functions depend only on the present, or more
precisely on $X_{t-}$: $\lambda(t, X_u, 0 \le u < t) = \lambda(t, X_{t-})$. The model is (time)-homogeneous
if these functions do not depend on time: $\lambda(t, X_{t-}) = \lambda(X_{t-})$.
Typical physical models are time-homogeneous Markov models. The Markov
property means that knowledge of the past cannot improve our knowledge
of the future if we know the present, and the homogeneous property means
that the laws of the universe we have used for constructing the model do not
change. So one can argue that if the model we consider is not a homogeneous
Markov model we have omitted important components in the model. 

If a process is time-homogeneous Markov in $S^M$ under $P^*$ this does not
in general hold for $S^m \subset S^M$; thus the homogeneous Markov property is
not stable in $\mathcal{D}$. This fact explains why it is often necessary to consider non-homogeneous
and even non-Markovian models in biology; indeed the systems
considered are often oversimplified in view of the complexity of the real sys- 
tems, leading to a loss of the time-homogeneous Markov property. 
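
A deterministic toy example, chosen by us and not taken from the text, shows
how the property is lost when a component is dropped. The pair (X_1, X_2)
solving dX_1 = X_2 dt, dX_2 = -X_1 dt with X_1(0) = 0, X_2(0) = 1 is
time-homogeneous and its future depends only on its present state; its
solution is X_1(t) = sin t, X_2(t) = cos t. The Python sketch below (function
name and time points are assumptions) checks that X_1 alone takes the same
value at two times while evolving in opposite directions, so no equation of
the form dX_1 = f(X_1) dt can describe the reduced system.

```python
import math


def same_present_different_future(t_a=math.pi / 6, t_b=5 * math.pi / 6):
    """Compare X1 = sin(t) and its slope dX1/dt = cos(t) = X2 at two times.

    Returns (gap in X1, gap in slope) between the two time points.
    """
    state_gap = abs(math.sin(t_a) - math.sin(t_b))
    slope_gap = abs(math.cos(t_a) - math.cos(t_b))
    return state_gap, slope_gap


if __name__ == "__main__":
    state_gap, slope_gap = same_present_different_future()
    print(f"gap in X1: {state_gap:.2e}, gap in dX1/dt: {slope_gap:.3f}")
```

The present value of X_1 is the same at both times but its rate of change
differs, which is exactly the information carried by the omitted component
X_2.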

3.2.3 Faithfulness and stability of influences 

To go further, we assume that $P^*$ is "faithful", a property which is discussed
for instance in Robins et al. (2003) for directed acyclic graphs, and that we 
define in our context as: 

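Definition 11 (Faithful probability) A probability $P$ is faithful for a sequence
$\mathcal{S}$ if for any $S^{m'}, S^m \in \mathcal{S}$ such that $S^{m'} \subset S^m$ and such that
$X_j^m = X_j^{m'} = X_j$ and $X_k^m = X_k^{m'} = X_k$, we have that $X_j \rightarrow\rightarrow_{S^m} X_k$ implies
$X_j \rightarrow\rightarrow_{S^{m'}} X_k$. Equivalently, $X_j \nrightarrow\rightarrow_{S^{m'}} X_k$ implies $X_j \nrightarrow\rightarrow_{S^m} X_k$.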

Figure 1 illustrates a situation which is compatible with $P$ being faithful: if
$S^1$ (resp. $S^2$) has the left (resp. right) influence graph, we see that the weak
independence between $X_1$ and $X_3$ is stable when the system is enriched from
$S^1$ to $S^2$; on the other hand the influence of $X_3$ on $X_4$ disappears ($X_6$ acts
as a confounder process); finally the direct influence of $X_1$ on $X_2$ becomes an
indirect influence through $X_5$. Faithfulness does not hold in general; however
one may argue that it fails only in very specific cases. We show for
instance in Appendix A that

Proposition 1 (faithful-diffusion) If the system $S^M = (A^M, X^M)$ is such
that $\mathcal{A}^M = \{\emptyset, \Omega\}$ and $X^M$ is a linear time-homogeneous diffusion process
under $P$, then faithfulness holds for any sequence of nested systems
$\mathcal{S} = (S^1, \ldots, S^M)$, where $S^m = (A^m, X^m)$ is a system such that $X^m \in X^M$.

Even in the true probability the influences for X may be non-causal. 
However, with the faithfulness assumption two conclusions can be drawn: (i) 
if $X_1 \rightarrow\rightarrow_{S^{m'}} X_2$ then either this influence is causal or one can find a $S^m$,
$S^{m'} \subset S^m \in \mathcal{S}$, in which there is a process which influences both $X_1$ and $X_2$
(a common ancestor in graph terminology): such a process may be called a
confounder in epidemiological terminology; (ii) if $X_1 \nrightarrow\rightarrow_{S^{m'}} X_2$ this means
that $X_1$ does not have a causal influence on $X_2$. If an indirect influence in
$S^{m'}$ is causal it is stable by considering richer systems $S^m$; direct influences
in $S^{m'}$ may be related to indirect causal influences in $S^m$.

Now we study criteria of independence of processes, which leads us to 
a mathematical proof in our context of the causal interpretation of a direct 
influence of a randomized process (our Theorem 1). 

Definition 12 (Dynamical independence) If $X_j \not\to\!\!\to_{S^m} X_k$,
$X_k \not\to\!\!\to_{S^m} X_j$ and $X_j$ and $X_k$ have no common ancestor, we
say that $X_j$ and $X_k$ are dynamically independent in $S^m$.

Definition 13 (Non-influenced process) A process $X_A \in X^m$ is a
non-influenced process in $S^m = (\mathcal{A}^m, X^m)$ for probability $P$ if
$X_j \not\to_{S^m} X_A$ for $P$, for all $X_j \in X^m$.

Lemma 3 (Component and group Dynamical independence) If $X_j$ and $X_k$ are
dynamically independent in $S^m = (\mathcal{A}^m, X^m)$, it is possible to find
$X_A, X_B, X_C$ such that $X^m = (X_A, X_B, X_C)$ where $X_A$ and $X_B$ are
non-influenced in $S^m$. Conversely any component, say $X_j$, of $X_A$ and any
component of $X_B$, say $X_k$, are dynamically independent in $S^m$.

If $X_C$ is influenced by both $X_A$ and $X_B$, the influence graph of
$X^m = (X_A, X_B, X_C)$ where $X_A$ and $X_B$ are non-influenced is
$X_A \to X_C \leftarrow X_B$.

Lemma 4 (Dynamical independence and independence) Let
$X^m = (X_A, X_B, X_C)$. Consider the assumptions: (a) $X_C \not\to_{S^m} X_A$
and $X_C \not\to_{S^m} X_B$, (b) $P$ is faithful for the sequence
$\mathcal{S} = (S^2, S^m)$ with $S^2 = (\mathcal{A}, X^2)$ and
$X^2 = (X_A, X_B)$, and (c) the decomposition of $(X_A, X_B)$ in $S^m$ is that
of a diffusion with jumps such that, given $\mathcal{A}$, the corresponding SDE
satisfies the uniqueness conditions in law.

Consider the two following propositions:

(i) $X_A$ and $X_B$ are independent conditionally on $\mathcal{A}$;

(ii) $X_B \not\to\!\!\to_{S^m} X_A$, $X_A \not\to\!\!\to_{S^m} X_B$ and
$X_{A0} \perp\!\!\!\perp_{\mathcal{A}} X_{B0}$.

Then, under assumptions (a) and (b), (i) implies (ii). Moreover if (c) holds,
the converse is true.

Remark. Diffusions with jumps and conditions of uniqueness are given
in Jacod and Shiryaev (1987, Section III.2). In proposition (ii) $X_{A0}$ and
$X_{B0}$ are the initial values of $X_A$ and $X_B$.

Although the Lemma may seem intuitively obvious a general proof is not 
simple to find. See Appendix B for an outline of proof. 

Definition 14 ($\mathcal{S}$-Non-influenced process) Let $S^1$ be a system
belonging to a sequence $\mathcal{S}$; a process $I \in X^1$ is a
$\mathcal{S}$-non-influenced process for probability $P$ if, whatever
$S^m \in \mathcal{S}$, $I \perp\!\!\!\perp \mathcal{A}^m$ and
$X_j^m \not\to_{S^m} I$ for $P$, for all $j$.

The only clearly non-influenced processes for $P^*$ are randomised processes,
generally randomised attribution of a treatment. In observational studies, the
non-influenced quality will always be an assumption. For instance a genetic
factor may in some circumstances be considered as non-influenced (Didelez and
Sheehan, 2007). However, in our approach genetic factors would generally be
considered as (observed or non-observed) attributes, and not part of the state.

Theorem 1 (Non-influence and causality) Let $\mathcal{S}$ be an approximating
sequence for $S^1 = (\mathcal{A}, (I, X_j))$. Suppose that $P^*$ is faithful
for any sequence in the associated perfect system $S^M$, that $(I, X_j)$
satisfies the assumptions (a) and (c) of Lemma 4 and that
$I_0 \perp\!\!\!\perp_{\mathcal{A}^m} X_{j0}$ for all $m$. If $I$ is a
$\mathcal{S}$-non-influenced process for $P^*$ and $I \to_{S^1} X_j$ for
$P^*$, then $I$ causally influences $X_j$.

Proof. If $I$ did not causally influence $X_j$, we would have
$I \not\to\!\!\to_{S^M} X_j$ for $P^*$. Since $I$ is a non-influenced process,
according to Lemma 3 and Lemma 4, $I \perp\!\!\!\perp_{\mathcal{A}^M} X_j$, and
using the fact that $I \perp\!\!\!\perp \mathcal{A}^m$ for all $m$ it implies
$I \perp\!\!\!\perp_{\mathcal{A}^1} X_j$, and in particular that
$I \not\to_{S^1} X_j$, in contradiction with our assumption. Hence the Theorem.

It is interesting to give a version of the idea of instrumental variables 
(Stock, 2001; Angrist et al., 1996; Greenland, 2000) applied to our context; 
here the idea is applied only to assess the causal nature of an influence, while 
it is often used to estimate the magnitude of the causal influence in specific
models. We have the following result: 

Lemma 5 (Instrumental processes) Under the assumptions of Theorem 1, if $I$ is
a $\mathcal{S}$-non-influenced process, $I \to_{S^1} X_k$, and $X_j$ blocks the
paths from $I$ to $X_k$ in system $S^M$, then $X_j$ causally influences $X_k$.

Proof. By Theorem 1 we have
$I \to_{S^1} X_k \Longrightarrow I \to\!\!\to_{S^M} X_k$. If $X_j$ blocks the
paths from $I$ to $X_k$ in $X^M$, then by Lemma 1 we have
$X_j \to\!\!\to_{S^M} X_k$; hence the Lemma.

3.3 Implications for physics, systems biology and epidemiology

3.3.1 Physics 

Let us consider as an example the trajectories of planets in the solar system.
The physical system is the set of planets, simplified to points in
three-dimensional space, and we are interested only in their trajectories.
The state of the system can be represented by a multivariate process $X$,
the components of which are the positions and the velocities of the planets
in a given set of axes, and this process obeys a differential equation of the
type $dX_t = g(X_t)\,dt$, where $g(\cdot)$ is a function derived from Newton's
law of gravitation. $X$ is a time-homogeneous Markov process, although a
degenerate one because it is deterministic. There do not seem to be processes
that can be manipulated in this system. However we believe that the influence
of a planet on the trajectory of another planet may be considered as being of
causal nature.
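
As an illustration only, the deterministic dynamics $dX_t = g(X_t)\,dt$ can be
sketched numerically. The code below is a minimal sketch under stated
assumptions: motion is restricted to a plane, units are chosen so that the
gravitational constant equals 1, and the masses, step size and the names `g`,
`rk4_step` and `energy` are hypothetical choices made for this example, not
part of the paper. The function `g` makes the direct influences visible: the
acceleration of body i depends on the position of every other body j.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)

def g(state, masses):
    """Right-hand side of dX_t = g(X_t) dt for point masses in a plane.

    state is a flat list [x1, y1, vx1, vy1, x2, y2, vx2, vy2, ...].
    """
    n = len(masses)
    deriv = [0.0] * len(state)
    for i in range(n):
        xi, yi, vxi, vyi = state[4 * i: 4 * i + 4]
        ax = ay = 0.0
        for j in range(n):
            if j == i:
                continue
            dx = state[4 * j] - xi
            dy = state[4 * j + 1] - yi
            r3 = (dx * dx + dy * dy) ** 1.5
            # Newton's law: body j pulls body i towards itself.
            ax += G * masses[j] * dx / r3
            ay += G * masses[j] * dy / r3
        deriv[4 * i: 4 * i + 4] = [vxi, vyi, ax, ay]
    return deriv

def rk4_step(state, masses, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return [s + h * ki for s, ki in zip(state, k)]
    k1 = g(state, masses)
    k2 = g(shifted(k1, dt / 2), masses)
    k3 = g(shifted(k2, dt / 2), masses)
    k4 = g(shifted(k3, dt), masses)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def energy(state, masses):
    """Total mechanical energy, used here only as a sanity check."""
    n = len(masses)
    kinetic = sum(0.5 * masses[i] * (state[4 * i + 2] ** 2 + state[4 * i + 3] ** 2)
                  for i in range(n))
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dx = state[4 * j] - state[4 * i]
            dy = state[4 * j + 1] - state[4 * i + 1]
            potential -= G * masses[i] * masses[j] / math.hypot(dx, dy)
    return kinetic + potential

masses = [1.0, 1e-3]                                   # a "sun" and one "planet"
state0 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]      # near-circular orbit
state = list(state0)
for _ in range(1000):
    state = rk4_step(state, masses, 0.01)
```

Because the system is deterministic, the whole trajectory is a function of the
initial state; adding a third body to `masses` and `state0` corresponds to
enlarging the system in the sense used in the text.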

A first instance of the application of physical laws is to predict or to
control the state of the system: for instance one can predict eclipses or
control the trajectory of a spacecraft. In this case we assume that we know
the physical law and that we have a good system.

A second instance is that there is a discrepancy between P* and P s for 
the chosen system. If there is not much doubt about the physical laws we are
applying (here Newton's laws) then it may be deduced that the system
considered is not satisfactory and that it must be enlarged. A famous example
of such an instance is the discrepancy which appeared between the computed
and the observed trajectories of Uranus. Le Verrier made computations which
led to the discovery of Neptune in 1846. He assumed that the discrepancy
between the observed trajectory of Uranus and what was computed using
Newton's laws was due to the presence of another planet: he gave the computed
position of this planet to Johann Galle and Louis d'Arrest, who found it.

A third instance occurs if in spite of refining the system, a discrepancy 
persists. Then the physical laws may be cast into doubt. 

3.3.2 Systems biology 

The model is constructed with partially known mechanisms but some of the 
influences are unknown and even when causal influences are assumed, their 
precise forms are unknown. These models can be used to test whether some 
causal influences exist or to quantify them when they are assumed to exist. 
We will develop the analysis of the interaction between HIV and the immune 
system in section 5. 

3.3.3 Epidemiology

Most epidemiological studies endeavour to test the influence of a single factor 
on a disease process. The physical system contains all biological phenomena 
implied in the disease as well as the factor of interest; in general there is no 
physical law, only biological plausibility of some causal influences. A typical 
system is X = (F, D, C) where F is the factor of interest, D represents the 
disease and C are other processes taken from the system. Such a problem 
is most often modelled with random variables rather than with stochastic 
processes. The stochastic process framework makes it possible to take into
account the dynamics of the phenomena: typically $D$ would be a counting
process and the exposure factor $F$ may also vary in time, as is most often
the case in reality. The interest often lies in the possible causal influence
of $F$ on $D$. Testing whether $F \to_X D$ is generally expressed by saying
that we test whether $F$ is a risk factor for $D$ by an analysis adjusted on
$C$. It should be possible to formalise in our framework the condition of "no
unmeasured confounders" which makes it possible to conclude that $F$ causally
influences $D$. This however requires further work.

In many simple clinical trials the main interest lies in a particular in- 
fluence, that of a drug on a clinical endpoint. The aim is to test whether 
there is a causal influence without trying to understand which basic causal 
mechanisms may explain it, even if there is a biological plausibility that a 
certain molecule (or treatment) may have a causal influence on the clinical 
endpoint considered. That is, most often, we do not have physical laws. This 
is why randomised trials have been developed. If $F$ is a treatment that can
be randomised, it becomes $\mathcal{S}$-non-influenced. Then by Theorem 1 it
is sufficient to look at the influence of $F$ on $D$ in any model to deduce
the presence or absence of causal influence.

4 Model for the observations 

In most applications we do not have precise physical laws. Instead of a unique
probability we use a model, that is a family of probabilities
$(P_\theta^m)_{\theta \in \Theta}$ on $\mathcal{F}^m$. The choice of the model
may include scientific knowledge, that is a model can be considered as an
incompletely specified physical law. If the system $S^m$ is rich enough
(ideally if it is the "perfect" system $S^M$) and if the knowledge
incorporated in the model is correct, the model is well-specified, that is
$P_*^m \in (P_\theta^m)_{\theta \in \Theta}$. Even if the model is not well
specified it is interesting to find the value $\theta_0$ such that
$P_{\theta_0}^m$ is the closest to $P_*^m$. Since the latter is unknown we
need observations, which by definition are realisations of
$\mathcal{F}^m$-measurable random variables under probability $P_*^m$.
Generally complex systems will be observed with complex observation schemes,
leading to incomplete (or coarsened) or indirect observations. Generalising
the approach of Heitjan and Rubin (1991) to stochastic processes we may say
that the observations, represented by the sigma-field $\mathcal{O}^m$, are
generated by $g(X, G)$, where $G$ is a component which may be deterministic or
stochastic. If $G$ is deterministic we have
$\mathcal{O}^m \subset \mathcal{F}^m$; if however $G$ is random,
$\mathcal{O}^m$ is not a subset of $\mathcal{F}^m$.

To choose a probability in the model close to $P_*^m$ we must construct an
estimator $\hat\theta(\mathcal{O}^m)$. For maximum likelihood or maximum
penalised likelihood estimators we must compute the likelihood for the
observation, which is the Radon-Nikodym derivative of $P_\theta^m$ relative to
a reference probability $P_0$ on the sigma-field $\mathcal{O}^m$, and we
denote it $\mathcal{L}_\theta^{\mathcal{O}^m}$. If the mechanism leading to
incomplete data (m.l.i.d.) is deterministic this is equal to
$E_{P_0}(\mathcal{L}_\theta^{\mathcal{F}^m} \mid \mathcal{O}^m)$ and this is
relatively easy to compute. If not, the issue of ignorability of the m.l.i.d.
arises: if the m.l.i.d. is ignorable we can proceed as if it were
deterministic and nevertheless obtain the correct inference. For instance
Commenges and Gegout-Petit (2007) computed the likelihood for counting
processes observed with a complex, but ignorable, observation scheme. If the
m.l.i.d. is not ignorable we have to include $G$ in the system and consider
$X^{m'} = (X^m, G)$; we then have by definition
$\mathcal{O}^{m'} \subset \mathcal{F}^{m'}$ and we can apply the above
formula. The price to be paid is that we need additional assumptions and the
computation of the likelihood may not be easy.

In epidemiology one generally has a sample of observations for a sample of
systems indexed by $i$, $i = 1, \ldots, n$. The most common framework is that
the observations are independent and identically distributed. In this
framework, if we can describe the system and its observation for a generic
item, we can do it for the sample; this is why in this paper we always omit
the subscript $i$.

5 Dynamical models for HIV/AIDS

5.1 The problematic of AIDS through dynamic influence graphs

AIDS was identified in 1981 as a life-threatening disease due to acquired 
immunodeficiency. It was found that this immunodeficiency was essentially 
due to a decrease of the number of CD4+ T-lymphocytes. In 1983 it was 
found that this decrease was mainly due to the destructive replication of a
virus in CD4+ lymphocytes and this virus was denominated HIV. Thus we 
can formulate the causal pathway: "presence of HIV causes low CD4 counts 
which causes AIDS which causes death". Although most researchers would 
agree with this phrase and think that what is behind the word "cause" are 
particular biological mechanisms which could be further reduced to biochem- 
ical laws, it remains vague because i) time is only implicitly involved through 
the fact that cause precedes effect; ii) each modality is relative to another 
modality (presence vs absence, low vs high and so on). 

The dynamical model approach allows us to make the causal statement more
precise. First we construct the processes $I = (I_t)$, $T = (T_t)$,
$A = (A_t)$, $D = (D_t)$: $I$ is a counting process representing HIV
infection, $T$ has a continuous state-space and represents the CD4+
T-lymphocyte count; $A$ and $D$ are counting processes representing AIDS and
death respectively. We can express the causal structure by the influence
graph:

$I \to T \to A \to D$

Indeed we know from the results of research (involving virology, immunology,
and clinical research) that these influences can be interpreted as causal. It
is interesting to note that we consider that $I \to T$ is causal although it
is difficult to manipulate $I$. A more detailed description of the infection
can be made by introducing the viral load process $V = (V_t)$. There is of
course a direct influence of $I$ on $V$ because if $I_t = 0$ then
$dV_t = 0$. When considering the evolution of infected subjects the process
of interest is $V$ (not $I$, which is identically equal to one in these
patients).

5.2 From descriptive to mechanistic models

In the conventional epidemiological and biostatistical literature, linear
mixed-effect models have been used to analyse separately repeated
measurements of CD4 counts and viral load. For instance to analyse viral load
following initiation of a highly active anti-retroviral therapy (HAART) a
linear mixed-effect model with two slopes has been used (Jacqmin-Gadda et al.,
2000). Potential observations $Y_j$ are the viral load at time $t_j$, or a
logarithmic transformation of the viral load; for simplicity we will ignore
these normalising transformations here. Some data may be missing (a
non-ignorable mechanism here): $Y_j$ was observed only if $Y_j > \eta$, where
$\eta$ is a detection limit, while $1_{\{Y_j > \eta\}}$ was always observed.
The model can be written as:

$Y_j = \beta_0 + a_0 + (\beta_1 + a_1 + \gamma_1 A)\min(t_j, t^*) + (\beta_2 + a_2 + \gamma_2 A)(t_j - t^*) 1_{\{t_j > t^*\}} + \varepsilon_j$, (5)

where $\beta_0$, $\beta_1$, $\beta_2$ are parameters for the intercept, first
and second slopes respectively and $a_0$, $a_1$, $a_2$ are independent normal
random effects on the intercept, first and second slopes respectively; $t^*$
is the time of change of slope (supposed known), $A$ indicates the treatment
and the $\varepsilon_j$ are normal variables with zero expectations: they may
be independent or have a correlation structure. In the dynamical model
representation, this model can be written in terms of the process
$V = (V_t)$ living in continuous time, representing the concentration of
virus at time $t$. There are at least two ways of representing the random
effects: they could be degenerate components of the state or they could be
random attributes. We adopt the latter which leads to the simplest
expression:

$dV_t = [(\beta'_1 + \gamma_1 A) 1_{\{t < t^*\}} + (\beta'_2 + \gamma_2 A) 1_{\{t > t^*\}}]\,dt + \sigma\,dW_t$, with $V_0 = \beta'_0$, (6)

where $\beta'_0$ is a random initial condition and $\beta'_1$ and $\beta'_2$
are considered as random attributes; the link with the above model is that
$\beta'_l$ has expectation $\beta_l$ and variance $\mathrm{var}(a_l)$,
$l = 0, 1, 2$. The observation (treating the observation times as fixed) is
$\mathcal{O} = \sigma(1_{\{Y_j > \eta\}}, 1_{\{Y_j > \eta\}} Y_j, j = 1, \ldots, m)$
where $Y_j = V_{t_j} + \varepsilon'_j$. Note that the error $\varepsilon_j$ of
model (5) is the sum of the value of the martingale at $t_j$ and the
observation error in model (6): we have
$\varepsilon_j = \sigma W_{t_j} + \varepsilon'_j$. The models for the
observations may be the same if the correlation structure of the
$\varepsilon_j$ in model (5) is compatible with that produced by model (6).
The graph of this process is not very interesting since only $A$ influences
$V$.
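
To make the distinction between the model for the system and the model for
the observation concrete, here is a minimal sketch (not the authors'
implementation) that simulates one path of the two-slope diffusion by the
Euler-Maruyama scheme and then applies the detection limit. The function
name, the argument names and all numerical values are hypothetical; the
random attributes are passed in as already-drawn values, and a step that
straddles the change point keeps the slope of the phase in which it starts.

```python
import random

def simulate_viral_load(beta, gamma, treated, t_star, sigma, times, eta,
                        obs_sd, dt=0.01, seed=0):
    """Euler-Maruyama path of the two-slope model, with left-censored readings.

    beta  = (initial value, first slope, second slope), already drawn
    gamma = (treatment effect on first slope, on second slope)
    times = increasing observation times t_j
    Returns a list of (t_j, indicator Y_j > eta, Y_j or None if censored).
    """
    rng = random.Random(seed)
    v, t = beta[0], 0.0
    observations = []
    for t_obs in times:
        # System: advance the latent process V up to the observation time.
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)
            phase = 1 if t < t_star - 1e-12 else 2
            drift = beta[phase] + gamma[phase - 1] * treated
            v += drift * h + sigma * rng.gauss(0.0, h ** 0.5)
            t += h
        # Observation: noisy measurement, seen only above the detection limit.
        y = v + rng.gauss(0.0, obs_sd)
        above = y > eta
        observations.append((t_obs, above, y if above else None))
    return observations

# Hypothetical values, for illustration only.
example = simulate_viral_load(beta=(5.0, -2.0, 0.1), gamma=(-1.0, 0.0),
                              treated=1, t_star=1.0, sigma=0.2,
                              times=[0.0, 0.5, 1.0, 2.0, 4.0],
                              eta=1.7, obs_sd=0.1)
```

The indicator is always returned while the value is returned only when it
exceeds the limit, which mirrors the structure of the observation sigma-field
described in the text.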

A more elaborate model was proposed by Thiébaut et al. (2005). This was a
multivariate linear mixed model for jointly modelling viral load and CD4,
together with a possibly informative drop-out. For each of the two markers
there were two slopes with a fixed and a random effect (as in the previous
model). We leave aside here a certain number of features of that paper,
including modelling of the drop-out and of explanatory variables, to focus on
how the link between observations of HIV load and CD4 counts was modelled.
The model can be written:

$Y^1_j = \beta^1_0 + a^1_0 + (\beta^1_1 + a^1_1)\min(t_j, t^*) + (\beta^1_2 + a^1_2)(t_j - t^*) 1_{\{t_j > t^*\}} + \varepsilon^1_j$,

$Y^2_j = \beta^2_0 + a^2_0 + (\beta^2_1 + a^2_1)\min(t_j, t^*) + (\beta^2_2 + a^2_2)(t_j - t^*) 1_{\{t_j > t^*\}} + \varepsilon^2_j$,

where $\varepsilon^1_j$ and $\varepsilon^2_j$ are zero expectation normal
variables. For fixed $j$, $\varepsilon^1_j$ and $\varepsilon^2_j$ are
independent; the sequences $\varepsilon^k_j$, $j = 1, \ldots, m$, for
$k = 1, 2$ may be formed of independent variables or have a correlation
structure. The link between HIV load and CD4 counts was expressed by
correlations of the random effects $a^1_l$ and $a^2_l$, $l = 0, 1, 2$. In
particular we could expect negative correlations between the slopes of HIV
load and CD4 counts, which was indeed observed when fitting the model to the
data of a therapeutic trial (better viral response was correlated to better
immune response).

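The model can be expressed in the dynamical framework as:

$dV_t = [\beta'^{\,1}_1 1_{\{t < t^*\}} + \beta'^{\,1}_2 1_{\{t > t^*\}}]\,dt + \sigma_1\,dW_{1t}$, with $V_0 = \beta'^{\,1}_0$,

$dT_t = [\beta'^{\,2}_1 1_{\{t < t^*\}} + \beta'^{\,2}_2 1_{\{t > t^*\}}]\,dt + \sigma_2\,dW_{2t}$, with $T_0 = \beta'^{\,2}_0$,

where $V_t$ is the logarithm of the viral load and $T_t$ the CD4 count at
time $t$, and $\beta'^{\,k}_l = \beta^k_l + a^k_l$, $k = 1, 2$;
$l = 0, 1, 2$. As in the previous model there are several ways of treating
the random effects; for instance we may consider them as random attributes.
The observation is

$\mathcal{O} = \sigma(1_{\{Y^1_j > \eta\}}, 1_{\{Y^1_j > \eta\}} Y^1_j, Y^2_j, j = 1, \ldots, m)$ with $Y^1_j = V_{t_j} + \varepsilon'^{\,1}_j$; $Y^2_j = T_{t_j} + \varepsilon'^{\,2}_j$. (7)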

It is clear from the differential equations above that there is no influence 
of V on T whatever the values of the parameter: the influence graph is made 
of two disconnected vertices. We might have treated the random effects 
as ancestors, but in this representation too, there is no direct nor indirect 
influence of V on T. In this model T is SCLI from V which does not fit with 
the known mechanism of the infection. So although this model succeeded 
in fitting the data better than separate linear mixed models, it is unable to 
capture any relevant causal influence. 

There are different models in which we can express that viral load influences
CD4. Having made a clear distinction between the "model for the system" and
the "model for the observation" it is natural to construct a model including
components that are not observed at all, but which will be more satisfying
with respect to the way it represents the biological mechanisms. One may
distinguish infected and uninfected cells and take into account the causal
influences in the ODE system (Ho et al., 1995; Perelson et al., 1996). A
still more satisfying model distinguishes between quiescent ($Q$) and
activated ($T$) CD4 and between infectious ($V_I$) and non-infectious
($V_{NI}$) virus. Note that distinguishing quiescent and activated CD4 is a
way of enriching the state without simply adding a new component. To write
the differential equation for the model one uses additional assumptions which
are plausible in view of the knowledge of the biological mechanisms: for
instance we assume that new CD4+ T lymphocytes are produced (by the thymus)
at a rate $\lambda$, that only activated cells can be infected, and that the
probability that a cell and a virion meet is proportional to the product of
their concentrations. The model proposed by Guedj, Thiébaut and Commenges
(2007) was:



$dQ_t = (\lambda + \rho T_t - \alpha Q_t - \mu_Q Q_t)\,dt$

$dT_t = (\alpha Q_t - (1 - \eta 1_{\{I_{RT} = 1\}})\gamma T_t V_{It} - \rho T_t - \mu_T T_t)\,dt$

$dT^*_t = [(1 - \eta 1_{\{I_{RT} = 1\}})\gamma T_t V_{It} - \mu_{T^*} T^*_t]\,dt$

$dV_{It} = (\omega \mu_{T^*} \pi T^*_t - \mu_V V_{It})\,dt$

$dV_{NIt} = [(1 - \omega)\mu_{T^*} \pi T^*_t - \mu_V V_{NIt}]\,dt$



where $I_{RT}$ is the process indicating whether a treatment based on an
inhibitor of the reverse transcriptase is taken at time $t$. If we consider
the framework of a controlled clinical trial this process is non-influenced
and controlled (because its trajectory is obtained by randomisation). Guedj,
Thiébaut and Commenges (2007) assumed that some parameters were random. Such
parameters may be considered as random attributes while fixed parameters may
be considered as constants of a "physical law". Note that the system is
time-homogeneous, which is satisfactory from an explanatory point of view.
Moreover, as we noted in section 2.4, this makes it possible to draw the
influence graph of a deterministic model. We could also consider a stochastic
differential equation system but inference in this context is very
challenging. The observation is the same as in (7), with the viral load at
time $t_j$ given by $V_{I t_j} + V_{NI t_j}$ and the CD4 count given by
$Q_{t_j} + T_{t_j} + T^*_{t_j}$.
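
The mechanistic system can be explored numerically. The following is a
minimal sketch, assuming constant treatment over the whole period and
parameter values that are purely hypothetical (they are chosen so that the
example runs and are not the estimates of Guedj, Thiébaut and Commenges); the
names `PARAMS`, `rhs`, `rk4_step` and `simulate` are likewise illustrative.
It integrates the five equations with a fourth-order Runge-Kutta scheme, once
without and once with the reverse transcriptase inhibitor.

```python
# Hypothetical parameter values, chosen only so that the sketch runs.
PARAMS = {
    "lam": 2.0,        # production rate of new CD4 cells
    "rho": 0.1,        # rate T -> Q (assumed meaning)
    "alpha": 0.2,      # rate Q -> T (assumed meaning)
    "mu_Q": 0.01, "mu_T": 0.1, "mu_Tstar": 0.5, "mu_V": 5.0,  # loss rates
    "gamma": 0.02,     # infection rate per (activated cell x infectious virion)
    "pi": 100.0,       # virion production factor
    "omega": 0.5,      # fraction of produced virions that are infectious
    "eta": 0.9,        # treatment effect on the infection term
}

def rhs(state, treated, p=PARAMS):
    """Right-hand side of the system for state = (Q, T, T*, V_I, V_NI)."""
    Q, T, Ts, VI, VNI = state
    infection = (1.0 - p["eta"] * treated) * p["gamma"] * T * VI
    dQ = p["lam"] + p["rho"] * T - p["alpha"] * Q - p["mu_Q"] * Q
    dT = p["alpha"] * Q - infection - p["rho"] * T - p["mu_T"] * T
    dTs = infection - p["mu_Tstar"] * Ts
    dVI = p["omega"] * p["mu_Tstar"] * p["pi"] * Ts - p["mu_V"] * VI
    dVNI = (1.0 - p["omega"]) * p["mu_Tstar"] * p["pi"] * Ts - p["mu_V"] * VNI
    return [dQ, dT, dTs, dVI, dVNI]

def rk4_step(state, treated, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return [s + h * ki for s, ki in zip(state, k)]
    k1 = rhs(state, treated)
    k2 = rhs(shifted(k1, dt / 2), treated)
    k3 = rhs(shifted(k2, dt / 2), treated)
    k4 = rhs(shifted(k3, dt), treated)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def simulate(state, treated, days, dt=0.01):
    """Integrate the system for `days` time units with I_RT held constant."""
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, treated, dt)
    return state

start = [18.0, 18.0, 0.0, 1.0, 1.0]      # Q, T, T*, V_I, V_NI at time 0
end_untreated = simulate(start, treated=0, days=100)
end_treated = simulate(start, treated=1, days=100)

# Quantities entering the observation: total virus and total CD4.
viral_load_untreated = end_untreated[3] + end_untreated[4]
cd4_untreated = end_untreated[0] + end_untreated[1] + end_untreated[2]
```

In `rhs` the treatment indicator enters only the infection term, so in this
sketch its influence on the virus passes through the newly infected cells.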

We could consider mixing this model for the markers with a model for an event
such as an opportunistic disease, adding the component $D = (D_t)$ which is a
counting process. The risk of the opportunistic disease may be considered as
depending on the concentration of CD4+ T lymphocytes, so that keeping the
framework of a time-homogeneous Markov model we can propose a proportional
hazard model (but with constant base-line risk $\gamma_0$):

$dD_t = 1_{\{D_{t-} = 0\}} \gamma_0 \exp(\beta_1 Q_t + \beta_2 T_t + \beta_3 Z)\,dt + dM_t$,

where $Z$ is an explanatory variable. The graph for such a model is given
in Figure 2.
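
The event model can be sketched in a few lines. This is an illustration under
assumptions: the marker paths are treated as given (as in the deterministic
ODE model), the probability of no event is computed as the exponential of
minus the integrated intensity, the integral is approximated by a left
Riemann sum, and the names `intensity`, `survival_probability`, `gamma0` and
`beta` are hypothetical.

```python
import math

def intensity(d_prev, q, t_cells, z, gamma0, beta):
    """Intensity of the event process just before an increment.

    d_prev is the value of the counting process just before time t; once the
    event has occurred (d_prev != 0) the intensity is zero.
    """
    if d_prev != 0:
        return 0.0
    return gamma0 * math.exp(beta[0] * q + beta[1] * t_cells + beta[2] * z)

def survival_probability(path, z, gamma0, beta, dt):
    """Probability of no event over a marker path sampled every dt.

    path is a list of (Q_t, T_t) pairs on a regular time grid.
    """
    cumulative = sum(intensity(0, q, t_cells, z, gamma0, beta) * dt
                     for q, t_cells in path)
    return math.exp(-cumulative)

# Hypothetical example: constant markers, negative coefficients so that
# higher CD4 concentrations lower the risk.
flat_path = [(10.0, 5.0)] * 100
p_event_free = survival_probability(flat_path, z=0.0, gamma0=0.05,
                                    beta=(-0.1, -0.1, 0.0), dt=0.1)
```

The baseline does not depend on time, which is what keeps the joint model
time-homogeneous.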

Figure 2: Graph for the mechanistic HIV model.

Note that if the treatment was an inhibitor of protease the graph would be
different: the inhibitor of protease influences $V_I$ and $V_{NI}$. Also, in
an observational study, the treatment is in fact influenced by the
information on the clinical and biological state of the patient. If we want
to represent this situation we have to include the medical doctor in the
system: the doctor may decide to modify the treatment after having been
informed of the measurement of viral load (VL) and of CD4 counts (CD4); note
that the processes VL and CD4 are different from $V$ and $T$ because they
carry the information on measurements of these processes, that is
$(VL_t, CD4_t)$ carry the observation contained in $\mathcal{O}$ (see (7)) up
to time $t$. Then the graph could be as shown in Figure 3, where we have
represented by dotted lines the influences of the marker processes on their
measurements and the influence of these measurements on the treatment,
through the decision of the doctor.

Figure 3: Graph for the mechanistic HIV model.

Appendix A: Faithfulness in diffusion processes

We study the faithfulness property in the case of a system of linear
diffusions. For the sake of simplicity we consider a process $X^3$,
$X^3_t = (X_{1t}, X_{2t}, X_{3t})$, where $X_1$, $X_2$, $X_3$ are univariate
processes. Let us define the process $X^3$ by the following linear stochastic
differential equations with constant coefficients:

$dX_{1t} = (a_1 X_{1t} + b_1 X_{2t} + c_1 X_{3t})\,dt + dW_{1t}$
$dX_{2t} = (a_2 X_{1t} + b_2 X_{2t} + c_2 X_{3t})\,dt + dW_{2t}$ (8)
$dX_{3t} = (a_3 X_{1t} + b_3 X_{2t} + c_3 X_{3t})\,dt + dW_{3t}$

with initial conditions $X_{10} = X_{20} = X_{30} = 0$, where
$(W_1, W_2, W_3)$ are independent Brownian motions. We are interested in the
semi-martingale decomposition of $X^2_t = (X_{1t}, X_{2t})$ in its own
filtration $(\mathcal{X}_{1t} \vee \mathcal{X}_{2t})$. If we note
$E[X_{3t} \mid \mathcal{X}_{1t} \vee \mathcal{X}_{2t}] = \hat X_{3t}$ and use
the innovation theorem, we find that:

$dX_{1t} = (a_1 X_{1t} + b_1 X_{2t} + c_1 \hat X_{3t})\,dt + dM_{1t}$, with $dM_{1t} = dW_{1t} + c_1 (X_{3t} - \hat X_{3t})\,dt$,
$dX_{2t} = (a_2 X_{1t} + b_2 X_{2t} + c_2 \hat X_{3t})\,dt + dM_{2t}$, with $dM_{2t} = dW_{2t} + c_2 (X_{3t} - \hat X_{3t})\,dt$. (9)

$M_{1t}$ and $M_{2t}$ are independent Brownian motions in the filtration
$(\mathcal{X}_{1t} \vee \mathcal{X}_{2t})$.
If we suppose that the coefficient $b_1 \neq 0$, the probability would not be
faithful if $X_2$ did not influence $X_1$ in the reduced system
$(X_1, X_2)$, that is if $b_1 X_{2t} + c_1 \hat X_{3t} = f(X_{1t})$. We use
the linear filtering equations given in Pardoux (1991):

$d\hat X_{3t} = [X_{1t}(a_3 - R_t(a_1 c_1 + a_2 c_2)) + X_{2t}(b_3 - R_t(b_1 c_1 + b_2 c_2)) + \hat X_{3t}(c_3 - R_t(c_1^2 + c_2^2))]\,dt + R_t c_1\,dX_{1t} + R_t c_2\,dX_{2t}$ (10)

$dR_t = (2 c_3 R_t + 1 - R_t^2 (c_1^2 + c_2^2))\,dt$ (11)

A necessary condition for removing the dependence of $X_1$ on $X_2$ is that
the part driven by $dX_{2t}$ in $b_1 X_{2t} + c_1 \hat X_{3t}$ equals 0, that
is $b_1 = -R_t c_1 c_2$. Since $R_t$, which is a solution of the Riccati
differential equation (11), cannot be constant, we conclude that the model is
faithful.
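
The claim that the solution of the Riccati equation cannot be constant can be
checked numerically. The sketch below is illustrative only: it assumes the
equation $dR_t/dt = 2 c_3 R_t + 1 - R_t^2 (c_1^2 + c_2^2)$, the starting
value $R_0 = 0$ (the initial conditions are known, so the initial conditional
variance is zero), an explicit Euler scheme, and hypothetical coefficient
values; `riccati_path` and `stationary_value` are names introduced here.

```python
import math

def riccati_path(c1, c2, c3, t_end, dt=1e-3, r0=0.0):
    """Euler integration of dR/dt = 2*c3*R + 1 - R**2 * (c1**2 + c2**2)."""
    s = c1 * c1 + c2 * c2
    r, path = r0, [r0]
    for _ in range(int(round(t_end / dt))):
        r += (2.0 * c3 * r + 1.0 - r * r * s) * dt
        path.append(r)
    return path

def stationary_value(c1, c2, c3):
    """Positive root of 2*c3*R + 1 - R**2 * (c1**2 + c2**2) = 0."""
    s = c1 * c1 + c2 * c2
    return (c3 + math.sqrt(c3 * c3 + s)) / s

# Hypothetical coefficients, for illustration only.
path = riccati_path(c1=1.0, c2=1.0, c3=-1.0, t_end=10.0)
limit = stationary_value(1.0, 1.0, -1.0)
```

Starting from 0 the path increases towards its stationary value, so
$-R_t c_1 c_2$ changes over time and cannot equal a constant non-zero $b_1$.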
Now if we suppose that the coefficients are no longer constant but are
deterministic functions of time, and if we suppose that the following
relation is true

$b_1(t) = -R_t c_1(t) c_2(t)$ (12)

then the part driven by $dX_{2t}$ in $b_1(t) X_{2t} + c_1(t) \hat X_{3t}$
disappears. According to (12) and noting
$Z_t = b_1(t) X_{2t} + c_1(t) \hat X_{3t}$ (for convenience, we sometimes omit
the dependence of the coefficients
$b_1(t), b_2(t), b_3(t), c_1(t), c_2(t), c_3(t)$ on $t$), we have

$dZ_t = b_1(t)\,dX_{2t} + c_1(t)\,d\hat X_{3t} + (b_1'(t) X_{2t} + c_1'(t) \hat X_{3t})\,dt$

$= c_1(t)[X_{1t}(a_3 - R_t(a_1 c_1 + a_2 c_2)) + X_{2t}(b_3 - R_t(b_1 c_1 + b_2 c_2)) + \hat X_{3t}(c_3 - R_t(c_1^2 + c_2^2))]\,dt + R_t c_1^2\,dX_{1t} + (b_1'(t) X_{2t} + c_1'(t) \hat X_{3t})\,dt$

$= [X_{1t} c_1(t)(a_3 - R_t(a_1 c_1 + a_2 c_2)) + X_{2t}[c_1(t)(b_3 - R_t(b_1 c_1 + b_2 c_2)) + b_1'(t)] + \hat X_{3t}[c_1(t)(c_3 - R_t(c_1^2 + c_2^2)) + c_1'(t)]]\,dt + R_t c_1^2\,dX_{1t}$.

$Z_t$ is the solution of a stochastic differential equation driven only by
$X_{1t}$ if

$c_1(t)[c_1(t)(b_3 - R_t(b_1 c_1 + b_2 c_2)) + b_1'(t)] = b_1(t)[c_1(t)(c_3 - R_t(c_1^2 + c_2^2)) + c_1'(t)]$.

Using (12) to substitute $b_1(t)$ we can show that if
$b_3 = R_t(b_2 c_2 + c_2' - c_2 c_3) + R_t' c_2 + R_t^2 c_2^3$, where $R_t'$
is the time derivative of $R_t$ given by (11), $Z_t$ is driven by $X_{1t}$
and the property of faithfulness fails. This case is extreme: if it holds,
the dynamics of $b_1(t)$, $b_2(t)$ and $b_3(t)$ are imposed by those of
$c_1(t)$, $c_2(t)$ and $c_3(t)$.



Appendix B: Proof of Lemma 4 

Proof: 

Let us first prove that (i) implies (ii). Consider the Doob-Meyer
decomposition of $X_A$ in the filtration
$(\mathcal{A} \vee \mathcal{X}_{At})$: $X_{At} = \Lambda_{At} + M_{At}$. By
(i), we have
$E[M_{At} - M_{As} \mid \mathcal{A} \vee \mathcal{X}_{As} \vee \mathcal{X}_{Bs}] = E[M_{At} - M_{As} \mid \mathcal{A} \vee \mathcal{X}_{As}]$
and thus the Doob-Meyer decomposition of $X_A$ is the same in the filtrations
$(\mathcal{A} \vee \mathcal{X}_{At})$ and
$(\mathcal{A} \vee \mathcal{X}_{At} \vee \mathcal{X}_{Bt})$. This implies
$X_B \not\to\!\!\to_{S^2} X_A$. By symmetry, we have
$X_A \not\to\!\!\to_{S^2} X_B$ and (ii) follows in $S^2$. Now by the
faithfulness property, we have (ii) in every system $S^m$ with
$S^2 \subset S^m$.

As for the converse, we prove it in the case of a process satisfying an SDE
governed by a Brownian motion in $(\mathcal{F}_t)$ with
$\mathcal{F}_t = \mathcal{A} \vee \mathcal{X}_{At} \vee \mathcal{X}_{Bt}$:

$X_{At} = X_{A0} + \int_0^t f(X_{As}, \alpha_0)\,ds + \int_0^t \sigma_{As}\,dW_{As}$ (13)

$X_{Bt} = X_{B0} + \int_0^t g(X_{Bs}, \beta_0)\,ds + \int_0^t \sigma_{Bs}\,dW_{Bs}$ (14)

where $(\alpha_0, \beta_0)$ is $\mathcal{A}$-measurable, $\sigma_A$ and
$\sigma_B$ are deterministic (A2) and $W_A$ and $W_B$ are two independent
Brownian motions (A1). We suppose that given
$(\alpha_0, \beta_0) \in \mathcal{A}$, the SDE satisfies assumptions ensuring
uniqueness in law (see for instance Revuz and Yor, 1991: Definitions IX.1.3
and IX.1.4 and Corollary IX.1.14 for the conditions). As by assumption $X_A$
and $X_B$ are non-influenced in $(\mathcal{A}, X^m)$, whatever the system
$S^{m'} = (\mathcal{A}, X^{m'})$ such that $X^2 \subset X^{m'} \subset X^m$
the process $(X_A, X_B)$ always satisfies the same SDE. However, we can take
a new probability space $(\Omega', \mathcal{F}')$ endowed with two
independent Brownian motions $W_{A'}$ and $W_{B'}$ and construct two
independent processes $X_{A't} - X_{A'0}$ and $X_{B't} - X_{B'0}$ on it, with
$(X_{A'0}, X_{B'0})$ equal in law to $(X_{A0}, X_{B0})$, and with $X_{A'}$
satisfying SDE (13) driven by $W_{A'}$ and $X_{B'}$ satisfying SDE (14)
driven by $W_{B'}$. By the first part of this demonstration, the
decomposition of $(X_{A'}, X_{B'})$ in
$\mathcal{X}_{A'} \vee \mathcal{X}_{B'}$ is given by the joint system of the
two equations satisfied by $X_{A'}$ and $X_{B'}$ in their own filtrations.
The vectors $(X_A, X_B)$ and $(X_{A'}, X_{B'})$ satisfy the same SDE; by
uniqueness in law this implies the conditional independence between $X_A$
and $X_B$ given $\mathcal{A}$.

Using the same reasoning one can extend the result to any diffusion system
with jumps (defined in Jacod and Shiryaev, 1987: p. 155) satisfying the
condition of uniqueness in law given $\mathcal{A}$ (see Theorem III.2.32 in
Jacod and Shiryaev, 1987).